Social networks transmitting covert or sensitive information cannot use all ties for this purpose. 
Rather, they can only use a subset of ties that are strong enough to be "trusted". In this paper 
we consider transitivity as evidence of strong ties, requiring that each tie can only be used if the 
individuals on either end also share at least one other contact in common. We examine the effect of 
removing all non-transitive ties in two real social network data sets. We observe that although some 
individuals become disconnected, a giant connected component remains, with an average shortest 
path only slightly longer than that of the original network. We also evaluate the cost of forming 
transitive ties by deriving the conditions for the emergence and the size of the giant component in 
a random graph composed entirely of closed triads and the equivalent Erdos-Renyi random graph. 
I. INTRODUCTION 

The strength of weak ties is the concept that individ- 
uals tend to be more successful in acquiring information 
about job opportunities by contacting individuals that 
they did not see often (their weak ties) [5]. The rationale 
behind this idea is that close friends tend to have similar 
information to us because they share similar interests, 
profession, or geographical location. Weak ties on the 
other hand are between individuals who don't have much 
in common, including other contacts, and the information 
they have access to will tend to be different. A shared 
contact between two individuals forms a closed triad (tri- 
angle), where all three people know one another. Strong 
ties are usually parts of triads because good friends or 
close professional contacts of one person will tend to know 
one another. In this paper we make the simplifying assumption 
that 'weak ties' are those that are not part of 
any closed triad, and we define 'strong ties' as those whose 
endpoints share at least one other contact in common. In other 
contexts the strength of the tie may include measures 
such as frequency or length of contact, but for simplicity 
here we consider only the presence of closed triads. 

While weak ties may be preferred in acquiring job in- 
formation, one may be interested in assembling a team 
or otherwise gathering information that is distributed in 
different parts of a social network using only strong ties. 
In the case of the Madrid terrorist bombings on March 
11th, 2004, the individuals behind the attack were able 
to procure knowledge about making explosive devices, 
hashish to trade for explosive materials, and the explo- 
sive material itself using their strong ties. Had they used 
weak ties which would have been less reliable, their plot 
may have been exposed and their intentions thwarted. 
Sinister plots are not the only example of a planning ac- 
tivity that can benefit from using strong ties to maintain 
confidentiality. Scientists may wish to forge collabora- 
tions requiring diverse expertise [|| , and in doing so they 
may wish to keep a competitive edge by not broadcasting 
their ideas over weak ties. Similar situations may arise 
in the formation of business alliances, where companies 
seek to complement their strengths through mergers, ac- 
quisitions, cross licensing of intellectual property, or joint 
ventures, but do not wish to leak their next steps to com- 
petitors. 

There are also processes which describe the contagion 
of new ideas and practices in which the credibility of 
information or the willingness to adopt an innovation re- 
quires independent confirmation from multiple sources. 
Unlike a 'simple' biological contagious agent carrying a 
disease, which can be transferred through a single contact 
between two individuals, ideas and opinions ('complex' 
agents) may need to be heard from multiple contacts be- 
fore being adopted 0. Whether one considers teenagers 
deciding to buy a new brand of jeans or farmers starting 
to plant a new type of corn, the decisive event may not 
be hearing about an innovation, but observing enough 
people participating to be convinced that the innovation 
should be adopted jl7 t lll | . 

The presence of closed triads enhances the probability 
that complex contagion can spread on a network. Social 
networks tend to have a much higher probability of closed 
triads than the equivalent random networks [12], [20]. An 
intuitive reason is given by structural balance theory Q 
which states that ties tend to be transitive: if a node 
is connected to two other nodes (is a member of two 
dyads), those two nodes are much more likely on average 
to be connected than two randomly chosen nodes. 
Recently, it has also been shown that many real world 
networks, including social networks, contain overlapping 
k-cliques 0. Within a k-clique, each of the k nodes is 
connected to each of the other $k - 1$ nodes, forming a densely 
knit community containing $\binom{k}{3}$ closed triads. Two cliques 
were considered overlapping if they shared $k - 1$ nodes, 
and the question was posed whether these overlapping 
cliques themselves form a cluster containing a finite fraction 
of the network (that is, whether the network percolates). In contrast, in 
this paper, we are interested not in the overlap of cliques, 
but the strength of ties between individuals. A message 
can be passed between two communities, even if they 
share only one individual in common, as long as that 
individual has strong ties within both communities. 

Our results are as follows. Given the potential impor- 
tance of closed triads both in assembling varied exper- 
tise and in the diffusion of innovation, we first determine 
how they are linked together in observed social networks. 
We find that removing non-transitive ties from these so- 
cial networks shrinks the giant component, but does not 
break it up. These results show that social networks are 
composed of overlapping communities, with each com- 
munity providing strong ties, and the overlap providing 
a way to traverse the network using strong ties. Sec- 
ondly, we seek to quantify the impact this local struc- 
tural requirement has on the global properties of a net- 
work, such as the phase transition in the emergence of 
a giant component. To this end, we model a random 
graph constructed entirely of closed triads and compare 
its properties to that of an Erdos-Renyi graph with the 
same number of nodes and edges. We derive the result 
that the giant connected component emerges at the same 
average connectivity (average degree $\langle k \rangle = 1$), but that 
it does not grow as quickly in the triad graph as the 
average connectivity increases further. Numerical sim- 
ulations reveal that the average shortest path is quite 
similar in both networks. Essentially, requiring transi- 
tive closure allows fewer nodes to be connected (since 
1/3 of the links must be redundant rather than reaching 
out to connect additional nodes). However, the resulting 
connected component will have an average shortest path 
that scales logarithmically with the size of the graph, just 
as it would in an Erdos-Renyi graph. 



II. SOCIAL NETWORKS WITHOUT WEAK 
TIES 

In order to study the connectedness of social networks 
without weak ties, we analyzed two data sets. The first, 
and smaller, data set is the social network of the Club 
Nexus online community at Stanford in 2001 [1]. Much 
TABLE I: Distribution of connected components in online 
communities. 



like many later online social networking services, it al- 
lowed individuals to sign up and list their friends on the 
site. The 'buddy' lists were aggregated into a single social 
network of reciprocated links. Within a few months of 
its introduction, Club Nexus attracted over 2,000 under- 
graduates and graduates, together comprising more than 
10 percent of the total student population. The Club 
Nexus network is only a biased subset of the complete 
student social network because students had free choice 
of how many friends to list. Nevertheless, the data does 
provide a proxy of the true social network, from which 
one can derive interesting properties. For example, trian- 
gles are quite prevalent in this network, with a clustering 
coefficient of 0.17, which is 40 times greater than what it 
would be for an equivalent Erdos-Renyi random graph. 
The average distance between any two individuals is just 
4 hops. 

Adamic et al. [1] found that edges with high betweenness, 
where betweenness reflects the number of shortest 
paths that traverse the edge, tended to connect people 
with less similar profiles. These profiles included infor- 
mation about the student's year, field of study, person- 
ality, hobbies and other interests. The observation that 
ties of high betweenness lie between dissimilar individ- 
uals supports the hypothesis that weak ties bridge dif- 
ferent communities. Edges with high betweenness also 
tend to not be part of closed triads, because each edge 
in the triad provides a possible alternate path. In fact, 
a recently-devised clustering algorithm relies on identi- 
fying communities by removing edges that participate in 
fewest closed triads and longer loops [16]. It is therefore a 
concern that removing non-transitive ties from a network 
would tend to break it apart into disconnected communi- 
ties. This would mean that diverse expertise may not be 
reachable and new innovations may not flow throughout 
the network. 

In the case of the Club Nexus network, we can dismiss 
the concern, because the network is robust with respect 
to the removal of weak links, which account for 19% of 
all links. Rather than breaking up into many discon- 
nected communities, the network sheds some nodes and 
shrinks modestly. Most obviously, the 239 leaf nodes cannot 
be part of triangles because they link to just one other 
node. They each become a disconnected component with 
the removal of weak ties, which is justified in this context 
because they are peripheral actors. Table I shows 
the distribution in size of the connected components for 
the original network and the network with weak links 
removed. Note that both networks have a giant compo- 
nent containing the majority of the nodes. The removal 
of weak ties does not separate communities of large size: 
the largest one is composed of just 6 nodes. The removal 
of weak ties does cause a slight increase in the average 
shortest path between reachable pairs: although the 
fraction of reachable pairs drops from 72% to 51%, the 
average shortest path increases only from 3.9 hops to 4.1. 
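
The filtering step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used for the analysis: the function names and the toy edge list are hypothetical, and a tie is treated as strong when its two endpoints share at least one common neighbor.

```python
from collections import deque


def build_adjacency(edges):
    """Return an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def strong_tie_subgraph(adj):
    """Keep only edges whose endpoints share at least one common neighbor."""
    strong = {u: set() for u in adj}  # every node is kept, possibly isolated
    for u, nbrs in adj.items():
        for v in nbrs:
            if adj[u] & adj[v]:  # a shared contact closes a triad
                strong[u].add(v)
    return strong


def component_sizes(adj):
    """Sizes of connected components, largest first (breadth-first search)."""
    seen, sizes = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        sizes.append(size)
    return sorted(sizes, reverse=True)


# Hypothetical toy network: two triangles sharing node 3,
# plus a leaf (7) and a non-transitive tie (5-8).
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (3, 5), (5, 8), (1, 7)]
adj = build_adjacency(edges)
print(component_sizes(adj))                       # [7]
print(component_sizes(strong_tie_subgraph(adj)))  # [5, 1, 1]
```

In the toy example the two triangles remain connected through their shared node, while the leaf and the endpoint of the non-transitive tie become isolates, mirroring the qualitative behavior reported for Club Nexus.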

The next network we consider is the network of AOL 
Instant Messenger (AIM) links submitted to the website 
buddyzoo.com. The system uses Buddy Lists to 
show users which buddies they have in common with 
their friends, to visualize their Buddy List, to compute 
shortest paths between screennames, and to show each 
user's prestige based on the PageRank [14] measure applied 
to the network. Our anonymized snapshot of the 
data is from 2004 and includes 140,181 users who submit- 
ted their buddy lists to the BuddyZoo service, as well as 
7,518,816 users who did not explicitly register with Bud- 
dyZoo but were found on the registered users' Buddy 
Lists. This is therefore a rather large social network. It 
was previously studied to determine whether direct links 
can be concealed in the network, for example to manipulate 
an online reputation mechanism [7]. In the context 
of BuddyZoo, this would mean that two people would 
remove each other from their Buddy Lists in an attempt 
to hide their connection. But unless they share no other 
'buddies' in common, they would still be linked as 'friends 
of friends' and arguably would have a more difficult time 
denying acquaintance. 9% of the users have only a sin- 
gle connection, and would disconnect themselves from 
the network if they were to remove it. Of the remaining 
pairs of users, only 19% could remove their direct link 
and be at least distance 3 from each other, while all oth- 
ers would remain friends of friends. This is equivalent to 
asking what percentage of the edges are parts of trian- 
gles, which is the question we are currently interested in. 

In order to determine the presence of strong ties, we 
consider only users who explicitly registered with Bud- 
dyZoo, but we allow an edge to be considered transitive 
if it is part of a closed triad that includes an unregistered 
user. This is because we know that two people share a 
contact, even if that contact did not register. We exclude 
shared contacts that have indegree greater than 1000, 
because those could be AIM bots (automated response 
programs). We do not include unregistered contacts in 
the network itself because their Buddy List information 
is incomplete. The degree distribution is highly skewed 
and there are many isolates in the network. On aver- 
age, each user is connected via a reciprocated tie to 6.83 
other registered BuddyZoo users. We require a tie to be 
FIG. 1: The distribution of the strength of ties, measured as 
the number of triads each tie participates in. 




FIG. 2: The largest component of the reduction of the Bud- 
dyZoo network where each tie participates in at least 47 tri- 
ads. The triads themselves are not all shown — only the ties 
that share a threshold number of them. 



reciprocated, since it is possible for one AIM user to add 
someone to their buddy list without that person adding 
them in turn. 

As in the case of the Club Nexus social network, we 
find that removing weak ties does not have a dramatic 
effect on the BuddyZoo network. Although several com- 
munities containing a couple of dozen nodes do split off, 
the giant component shrinks modestly, from occupying 
88.9% of the graph to occupying 87.5% of it. The aver- 
age shortest path increases by a fraction of a hop from 
7.1 to 7.3. Usually any lengthening in the path decreases 
the probability of a successful transmission if the proba- 
bility that the message is transferred at each step is less 
than 1. However, we do not observe considerable 

lengthening of the average shortest path until we impose 
a higher threshold on tie strength. In order to consider 
TABLE II: Distribution of connected components in the Bud- 
dyZoo AOL instant messenger community. A tie is considered 
weak if two users who list each other on their buddy lists do 
not list a third person in common. 





FIG. 3: The size of the giant component as only ties of a 
minimum strength (measured in the number of triads it is a 
part of) are kept in the network. The inset shows the growth 
of the average shortest path between connected pairs. 



more restrictive requirements on tie strength, we vary the 
strength threshold as follows: rather than considering any tie 
in a single closed triad to be strong, we require that it be 
part of at least $j$ closed triads. Figure 1 shows the distribution 
of tie strengths, where the mean number of shared 
ties is 17.4 and the median is 13. Figure 2 shows the 
largest component of nodes where each tie participates 
in at least 47 triads. There are several dense cliques, but 
the largest component is quite small, with only 233 nodes. To 



investigate how rapidly the giant component shrinks and 
how much the average shortest distance changes, we con- 
sider reduced networks where only ties of above threshold 
strength, measured by the number of triads the tie participates 
in, are kept. Figure 3 shows the giant component 
size and average shortest path between all connected 
pairs as the threshold is increased from zero to 35 triads. 
We observe that the giant component shrinks gradually, 
indicating that a substantial portion of the network is 
spanned by ties of moderate strength. This would indi- 
cate that the network is composed of overlapping commu- 
nities rather than separate communities that are bridged 
by weak ties. What is more, removing weak ties does not 
separate large communities from one another. Rather, 
a few smaller communities and many isolates are spun 
off as the tie strength threshold is increased. Removing 
weak ties has an additional cost beyond isolating some 
individual nodes and smaller communities — it increases 
the average shortest path between reachable pairs. So 
even though the giant component is shrinking, we are 
removing the shortcuts that span it. The average short- 
est path more than doubles as we increase the threshold 
from 1 to 25. 
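
The thresholding procedure can be illustrated with a short Python sketch. It is a sketch under stated assumptions rather than the code behind Figure 3: the helper names and the toy network are hypothetical, tie strength is taken to be the number of neighbors the two endpoints share, and a tie is kept when its strength is at least the threshold $j$.

```python
def tie_strengths(adj):
    """Map each undirected edge (u, v), u < v, to its number of shared neighbors."""
    strengths = {}
    for u, nbrs in adj.items():
        for v in nbrs:
            if u < v:
                strengths[(u, v)] = len(adj[u] & adj[v])
    return strengths


def largest_component_fraction(nodes, edges):
    """Fraction of nodes in the largest connected component (union-find)."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    counts = {}
    for n in nodes:
        root = find(n)
        counts[root] = counts.get(root, 0) + 1
    return max(counts.values()) / len(nodes)


def giant_fraction_by_threshold(adj, thresholds):
    """Largest-component fraction when only ties of strength >= j are kept."""
    strengths = tie_strengths(adj)
    nodes = list(adj)
    return {
        j: largest_component_fraction(
            nodes, [edge for edge, s in strengths.items() if s >= j]
        )
        for j in thresholds
    }


# Hypothetical toy network: a 4-clique {1,2,3,4}, a triangle {4,5,6}, and a leaf 7.
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4),
         (4, 5), (4, 6), (5, 6), (6, 7)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

fractions = giant_fraction_by_threshold(adj, range(4))
print(fractions)  # fractions 7/7, 6/7, 4/7, 1/7 for j = 0, 1, 2, 3
```

The toy network loses its leaf at $j = 1$, the weaker triangle at $j = 2$, and everything at $j = 3$, which is the same gradual shedding of nodes seen in the empirical curve.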

The strong tie robustness of the Club Nexus and Bud- 
dyZoo networks is encouraging, especially in compari- 
son to what one might expect in a Watts-Strogatz (WS) 
type small world model [20] or an Erdos-Renyi graph. In 
the WS model, the network is constructed from a lattice 
where each node is connected to k neighbors on each side. 
For k > 1, this means that each node participates in local 
closed triads. In the model, a fraction p of the links are 
rewired with one endpoint placed randomly among the 
nodes. It is the presence of these random links that gives 
the WS model a shortest path that scales logarithmically 
with the size of the graph. Such a link is unlikely to be 
part of a triangle, however, since the probability of any two 
nodes linking randomly is proportional to $1/N$ in such a 
graph. Therefore, removing weak links in a WS model 
removes the shortcuts, leaving an average shortest path 
that scales linearly with the size of the graph. Assum- 
ing that nodes close together on the lattice share simi- 
lar information, one would need to make many hops in 
order to find novel information. In Section III C we will 
show that the occurrence of strong ties in an Erdos-Renyi 
graph is unlikely unless the average degree increases with 
the number of nodes in the network. Therefore, remov- 
ing all edges that are not part of a triangle will isolate 
most of the nodes in random graphs where the average 
degree is constant or nearly constant with respect to the 
number of nodes. 
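
A quick numerical check makes the Erdos-Renyi claim concrete. The sketch below is our own illustration and not the derivation of Section III C: it assumes a $G(N, p)$ graph with $p = \langle k \rangle/(N-1)$, in which a given edge lies in at least one triangle with probability $1 - (1 - p^2)^{N-2}$, since each of the other $N - 2$ nodes is independently a common neighbor with probability $p^2$. The function name is hypothetical.

```python
import math


def er_strong_tie_probability(avg_degree, n):
    """Probability that a given edge of G(n, p) lies in at least one triangle.

    Assumes p = avg_degree / (n - 1); each of the other n - 2 nodes is a
    common neighbor of the edge's endpoints independently with probability p^2.
    """
    p = avg_degree / (n - 1)
    # 1 - (1 - p^2)^(n - 2), evaluated stably for tiny p
    return -math.expm1((n - 2) * math.log1p(-p * p))


for n in (1_000, 10_000, 100_000):
    print(n, er_strong_tie_probability(5, n))
```

For fixed average degree the probability falls off roughly as $\langle k \rangle^2 / N$, so almost every edge is a weak tie once $N$ is large.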



III. MODEL OF A RANDOM TRIANGLE 
GRAPH 

Given the results of the previous section, where we see 
a very high prevalence of transitive ties and a robustness 
of the network with respect to removal of weak ties, we 
seek to answer the basic question of the cost of requiring 
all ties to be transitive. In order to do this we consider the 
very simplest model of a random graph where every edge 
between two nodes is part of at least one closed triad, and 
investigate some properties of the graph analytically. In 
essence, the graph is composed entirely of triangles, and 
we model this kind of graph by assigning links among 
any three randomly chosen nodes in the graph. Strictly 
speaking, for a graph with $|V| = N$ nodes, there are $\binom{N}{3}$ 
possible combinations of nodes that can form a triangle. 
Each triangle forms with probability $b$, so that on average 
we randomly choose $M = b \binom{N}{3}$ triplets of nodes and 
link them with three edges. 
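
A minimal Python sketch of this construction follows. As a simplifying assumption it draws exactly $M$ distinct triples uniformly at random, instead of including each of the $\binom{N}{3}$ triples independently with probability $b$; for sparse graphs the two procedures should behave almost identically. The function name and parameters are hypothetical.

```python
import random


def random_triangle_graph(n, m, seed=None):
    """Build a graph on nodes 0..n-1 from m distinct, uniformly chosen triples.

    Returns (adj, triples): an adjacency map {node: set(neighbors)} and the
    set of chosen triples. Requires m <= C(n, 3).
    """
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    triples = set()
    while len(triples) < m:
        triple = tuple(sorted(rng.sample(range(n), 3)))
        if triple in triples:
            continue  # redraw so that exactly m distinct triples are used
        triples.add(triple)
        a, b, c = triple
        # link the three nodes with three edges
        adj[a].update((b, c))
        adj[b].update((a, c))
        adj[c].update((a, b))
    return adj, triples


adj, triples = random_triangle_graph(n=1000, m=300, seed=1)
num_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
print(len(triples), num_edges)
```

By construction every edge of the resulting graph belongs to at least one closed triad, so all ties are strong in the sense used in this paper.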

Note that our method of constructing transitive graphs 
is similar to a particular instance of the Newman [11] 
model for constructing highly clustered graphs. In the 
Newman clustered network model, one takes a bipartite 
network of individuals and groups. One then constructs 
a one-mode projection of the random graph by adding, 
with a given probability p, edges directly between indi- 
viduals who belong to the same group. However, unlike 
[11], in our model the probability for nodes to connect 
to each other in the same group is 1, and the number of 
members in each group is constant at 3. 



A. Degree distribution 

We consider the degree distribution of the graph start- 
ing from the distribution of a node belonging to k closed 
triads. 

For each node $u$, there is a total of $R = \binom{N-1}{2}$ possible 
triangles which have $u$ as one of the vertices. For 
each triple of vertices, the probability of being selected 
to have links in the graph is $b$. Let $r_m$ be the probability 
for a node to belong to $m$ chosen triples. Then 

$$r_m = \binom{R}{m} b^m (1-b)^{R-m} \qquad (1)$$




On the other hand, we will now show that it is unlikely 
that our fixed node $u$ is part of two triangles with an edge 
in common. Our node $u$ has degree $k$ if, for some $m$, node 
$u$ is in $m$ chosen triples on a total of $k$ distinct nodes aside 
from $u$. It is straightforward to show that $k/2 \le m \le \binom{k}{2}$. 
In fact, for even $k \ll N$, most of the probability is in 
the case $m = k/2$, as we now show. For even $k$, the 
probability that $u$ has degree $k$ is the probability that $u$ 
is in exactly $k/2$ chosen triples, adjusted for collisions of 
edges. Collisions affect the probability of degree $k$ in two 
ways: $u$ may be in exactly $m = k/2$ triples but a collision 
reduces the contribution to the probability of degree $k$, or 
$u$ may be in $m > k/2$ chosen triples but collisions increase 
the contribution to the probability that the degree is $k$. 

Conditioned on $u$ falling in exactly $m$ chosen triples, 
all sets of $m$ triples are equally likely. There are 
$\binom{R}{m} = \Theta\left(\frac{N^{2m}}{2^m m!}\right)$ possible sets of $m$ triples. Next, we want 
to count the number of sets of $m$ triples involving exactly $j$ 
neighbors of $u$, for $j < 2m$. We can pick the $j$ neighbors 
as a set in $\binom{N-1}{j}$ ways, but then we need to assign roles to 
the $j$ neighbors based on collision multiplicity. For example, 
4 triples among five neighbors $A, B, C, D, E$ 
of $u$ might be $\{u, A, B\}$, $\{u, A, C\}$, $\{u, A, D\}$, $\{u, B, E\}$. 
We can choose $A, B, C, D, E$ as a set; pick an element for 
the role of $A$ (which appears three times) in 5 ways; given 
that, pick an element for the role of $B$ in 4 ways; then 
$E$ in 3 ways; the remaining elements take the interchangeable 
roles of $C$ and $D$, for a total of $5 \cdot 4 \cdot 3 < 5!$ 
orderings. 

For us, a crude bound for the orderings of roles will 
suffice. There are at most $2m - j$ collisions counting 
multiplicities, and so at most $2m - j$ neighbors of $u$ that 
can be in more than one triple, that is, play a non-trivial role. 
There are at most $2m - j$ roles. So the number of ways to 
assign non-trivial roles is at most $(2m - j)^{2m-j}$. So the 
number of sets of $m$ triples involving exactly $j$ neighbors 
of $u$ is at most $\binom{N-1}{j}(2m - j)^{2m-j}$. Thus the ratio of 
these to the number of sets of $m$ disjoint triples is 

$$\frac{\binom{N-1}{j}(2m-j)^{2m-j}}{\binom{R}{m}} \le O\left(\frac{N^{j}\,(2m-j)^{2m-j}\, 2^m m!}{j!\, N^{2m}}\right) \le O\left(\frac{((2m-j)/N)^{2m-j}\, 2^m m!}{j!}\right).$$

We are interested in the case $2m - j \ge 1$. If $m$ and $j$ 
are constants, then we can ignore $2^m m!/j!$, and we get 

$$\frac{\binom{N-1}{j}(2m-j)^{2m-j}}{\binom{R}{m}} \le O\left(\frac{((2m-j)/N)^{2m-j}\, 2^m m!}{j!}\right) \le O(1/N).$$

By choosing an appropriately small probability $b$ of 
choosing a triple, we may assume that $m$ and $j$ are much 
smaller than $N$. But we cannot necessarily assume $m$ and 
$j$ are constants; for example, we may have $m!$ comparable 
to $N$. We now consider the case where $j$ or $m$ grows 
(slowly) with $N$, and where $N$ is sufficiently large. If 
$m \le j$, then $2^m m!/j! \le \binom{j}{m}^{-1} \le 1$. It follows that 

$$\frac{\binom{N-1}{j}(2m-j)^{2m-j}}{\binom{R}{m}} \le O\left(\frac{((2m-j)/N)^{2m-j}\, 2^m m!}{j!}\right) \le O\left(((2m-j)/N)^{2m-j}\right) \le O(N^{-1}).$$

On the other hand, if $m > j$, then $2m - j > m$, so 

$$\frac{\binom{N-1}{j}(2m-j)^{2m-j}}{\binom{R}{m}} \le O\left(\frac{((2m-j)/N)^{2m-j}\, 2^m m!}{j!}\right) \le O\left(((2m-j)/N)^{2m-j}\,(2m)^{m}\right) \le O\left((2m(2m-j)/N)^{2m-j}\right).$$

If $2m - j = 1$, this is $O(2m/N) \le N^{-1+o(1)}$. If $2m - j > 1$, 
then, since we may assume that $2m \ll \sqrt{N}$, we have 

$$\frac{\binom{N-1}{j}(2m-j)^{2m-j}}{\binom{R}{m}} \le O\left((2m(2m-j)/N)^{2m-j}\right) \le O\left(((2m-j)/\sqrt{N})^{2m-j}\right) \le O\left(((2m-j)^2/N)^{(2m-j)/2}\right).$$

This is $O((2m - j)^2/N) \le N^{-1+o(1)}$. 

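We conclude that the effect of collisions is small in any 
case. Thus we get the probability of $u$ having degree $k$ is 

$$p_k = \begin{cases} \binom{R}{k/2}\, b^{k/2} (1-b)^{R-k/2} \pm N^{-1+o(1)} & \text{if } k \text{ is even}, \\ \pm N^{-1+o(1)} & \text{if } k \text{ is odd}. \end{cases} \qquad (2)$$

After ignoring the additive amount $\pm N^{-1+o(1)}$, the 
corresponding generating function is given by 

$$G(z) = \sum_{k \text{ even}} \binom{R}{k/2}\, b^{k/2} (1-b)^{R-k/2} z^{k} = \left[b z^2 + 1 - b\right]^{R} \qquad (3)$$

The average degree $\langle k \rangle$ is then given by: 

$$\langle k \rangle = G'(1) = b(N-1)(N-2) \qquad (4)$$

And thus, we have the relationship between average 
degree $\langle k \rangle$ and the probability $b$ of any three nodes being 
connected by a triangle: 

$$b = \frac{\langle k \rangle}{(N-1)(N-2)} \qquad (5)$$

When $\langle k \rangle = O(1)$, $b = O(1/N^2)$. 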
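
As a small worked example of Equation (5), the following Python sketch converts a target average degree into the triangle probability $b$ and the expected number of chosen triples $M = b \binom{N}{3}$, which simplifies to $\langle k \rangle N / 6$. The function names are hypothetical and serve only as an illustration.

```python
from math import comb


def triangle_probability(avg_degree, n):
    """Equation (5): b = <k> / ((N - 1)(N - 2))."""
    return avg_degree / ((n - 1) * (n - 2))


def expected_triangles(avg_degree, n):
    """Expected number of chosen triples, M = b * C(N, 3) = <k> N / 6."""
    return triangle_probability(avg_degree, n) * comb(n, 3)


b = triangle_probability(3, 10_000)
print(b)                              # about 3.0e-8
print(expected_triangles(3, 10_000))  # approximately 5000
```

For example, a 10,000 node graph with average degree 3 requires on the order of 5,000 triangles, each contributing three edges.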



B. Accidental triangles and the clustering 
coefficient 



We should notice that in our model, the expected number 
of triangles in the network is not exactly $b \binom{N}{3}$. 
There is the possibility of forming an "accidental" triangle, 
which can occur when the pairs of nodes $u$ and $v$, $v$ 
and $w$, and $u$ and $w$ are linked, but the triangle $u, v, w$ was 
not among the $b \binom{N}{3}$ initially chosen triangles. The 
probability $b'$ of this occurring is the probability that no 
triangle was intentionally formed among $u$, $v$, and 
$w$, namely $1 - b$, times the probability that each of the three edges 
does occur in a triangle other than $u, v, w$: 

$$b' = (1 - b)\left[1 - (1 - b)^{N-3}\right]^{3} \qquad (6)$$


In this way, we know that the total expected number 
of triangles in this graph is $a \binom{N}{3}$, where $a = b + b'$. 

Thus, the ratio between the actual number of triangles 
in the graph and the input number of triangles is: 

$$\frac{a}{b} = 1 + \frac{(1-b)\left[1-(1-b)^{N-3}\right]^{3}}{b} \qquad (7)$$

However, $b'$ is very small compared with $b$ when the 
average degree of a node in the graph is a constant independent 
of the growth of the total number of nodes $N$. 
Since we have shown that $b = O(1/N^2)$, it is not hard 
to see that the ratio of the probability for any three nodes 
to be part of an accidental triangle and the probability 
for them to be a triangle that is constructed by randomly 
choosing groups is: 

$$\frac{b'}{b} = \frac{(1-b)\left[1-(1-b)^{N-3}\right]^{3}}{b} = O\left(\frac{1}{N}\right) \qquad (8)$$

Thus, we can see that when $N$ is large, and the average 
degree $\langle k \rangle$ is independent of $N$, then the chance of 
forming an accidental triangle is quite small compared to 
the triangles randomly drawn in constructing the model. 
Figure 4 shows the relation between $b'$ and average degree 
$\langle k \rangle$. 




FIG. 4: The ratio of the number of accidentally formed trian- 
gles to the number randomly chosen by the model. For fixed 
average degree and increasing number of nodes, the ratio of 
accidentally formed triangles drops as $1/N$. 
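
The $1/N$ decay of the accidental-triangle ratio can be checked numerically. The sketch below is an illustration only, with a hypothetical function name; it evaluates $b'/b$ with $b = \langle k \rangle / ((N-1)(N-2))$ and $b' = (1-b)[1-(1-b)^{N-3}]^3$, using `log1p` and `expm1` to avoid losing precision when $b$ is tiny.

```python
import math


def accidental_ratio(avg_degree, n):
    """Ratio b'/b of accidental to intentionally chosen triangles."""
    b = avg_degree / ((n - 1) * (n - 2))
    # q = 1 - (1 - b)^(n - 3): probability that a given edge lies in
    # some chosen triangle other than the one under consideration
    q = -math.expm1((n - 3) * math.log1p(-b))
    b_accidental = (1.0 - b) * q ** 3
    return b_accidental / b


for n in (1_000, 10_000, 100_000):
    print(n, accidental_ratio(3, n))
```

With the average degree held at 3, each tenfold increase in $N$ reduces the ratio by roughly a factor of ten, and the values stay close to $\langle k \rangle^2 / N$.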



In Figure 5 we show three instances of a randomly generated 
graph of triangles. Each graph has 1,000 nodes, 
but we form different numbers of triangles. Even though 
a giant component exists for each graph, it is only once 
the number of triangles equals the number of nodes that 
we observe a few random triangles forming. Therefore 
the formation of accidental triangles does not have a sub- 
stantial effect on the derivations below. 

The clustering coefficient $C$ is a measure of the prevalence 
of closed triads in a network 0, [20]. The expectation 
of the total number of connected triples of 
nodes (open and closed triads) in the graph is 
$N_{\text{triple}} = N \sum_k \binom{k}{2} p_k$, and the number of closed triads is 
$\approx b \binom{N}{3}$, since the number of accidental triangles is small. 
Thus the clustering coefficient is: 

$$C = \frac{3 b \binom{N}{3}}{N_{\text{triple}}} = \frac{3 b \binom{N}{3}}{N \sum_k \binom{k}{2} p_k} = O(1)$$

FIG. 5: Examples of triangle graphs with 1000 nodes with varying numbers of triangles $M$: (a) $M = 200$, (b) $M = 300$, (c) $M = 500$. Accidental triangles are marked 
with bold lines. 

We can see that when $N$ is large, the clustering coefficient 
of our graph is: 

$$C = O(1) \qquad (9)$$

which is significantly larger than the $O(N^{-1})$ clustering 
coefficient in an Erdos-Renyi random graph. For many 
types of real world networks, it has been shown that 
$C = O(1)$ [3], so it is of interest to see how removing weak 
ties in real networks changes the clustering coefficients. 



C. Phase transition and the giant component 

For the derivation of the phase transition and size of 
giant component, we loosely follow the generating function 
methods for clustered graphs in [11]. The phase 
transition is also known as the percolation threshold: 
the average degree at which a finite fraction of the network 
is connected, forming a giant component. In Part 
A, we have given $r_m$, the probability for a node to belong 
to $m$ triangles. Thus, averaging over all individuals and 
triangles, we have the mean number of triangles a node 
belongs to: $\mu = \sum_m m\, r_m$. 

The probability of having two edges within the triangle 
is 1, and the probability of having any other number is 
0. Therefore, the generating function of the number of 
edges for each node within a triangle is 

$$h(z) = z^2 \qquad (10)$$

Furthermore, for a node $A$ in the graph, the total number 
of other nodes in the whole graph that it is connected 
to by virtue of belonging to triangles is generated by: 

$$G_0(z) = \sum_{m=0}^{\infty} r_m \left(h(z)\right)^m \qquad (11)$$

where $r_m$ is the probability for a node to belong to $m$ 
groups as we defined before. This is also the generating 
function of the distribution of the number of nodes one 
step away from node $A$. 

The generating function of the distribution of the number 
of nodes two steps away from $A$ is $G_0(G_1(z))$, where 
$G_1(z)$ is the generating function for the distribution of 
the number of neighbors of a node arrived at by following 
an edge (excluding the edge that was used to arrive 
at the node): 

$$G_1(z) = \mu^{-1} \sum_{m=0}^{\infty} m\, r_m \left(h(z)\right)^{m-1} \qquad (12)$$

The necessary and sufficient condition for a giant component 
to exist is that, averaging over all the nodes in 
the graph, the number of nodes two steps away exceeds 
the number of nodes one step away [13], which can be 
expressed as: 

$$\left[\partial_z \left(G_0(G_1(z)) - G_0(z)\right)\right]_{z=1} > 0 \qquad (13)$$
Thus, we get the condition for the existence of a giant 
component in this graph: 

$$\left[\mu^{-1} \sum_{m=0}^{\infty} m(m-1)\, r_m \left(h(z)\right)^{m-2} h'(z)\right]_{z=1} > 1$$

$$2\mu^{-1} \sum_{m=0}^{\infty} m(m-1)\, r_m > 1$$

$$\frac{R(R-1)b^2}{Rb} > \frac{1}{2}$$

After simplifying the above equation, the condition is 

$$b > \frac{1}{N^2 - 3N} \qquad (14)$$




FIG. 6: Comparison of numerical simulations with analytical 
solutions for the fraction of the network occupied by the 
giant component of a 10,000 node triangle graph and the corresponding 
Erdos-Renyi graph. 

Since we will compare this graph with an Erdos-Renyi 
random graph with the same average degree $\langle k \rangle$, we express 
the condition for the existence of giant component 
in terms of the average degree given by Equation (4): 

$$\langle k \rangle > 1 + \frac{2}{N^2 - 3N} \qquad (15)$$

As $N \to \infty$, the condition is $\langle k \rangle > 1$. An interesting 
point is that this is exactly where the phase transition 
occurs in an Erdos-Renyi graph. Therefore, the 
requirement that all edges be transitive does not delay 
the appearance of the giant component. It does however 
have a tempering effect on the rate of growth of the giant 
component as we will see below. 

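When a giant component exists in the graph and the 
probability that a node to which $A$ is connected does not 
belong to it is $s$, the size of the giant component is given 
by: 

$$S = 1 - G_0(s_0) \qquad (16)$$

$$\phantom{S} = 1 - \sum_{m=0}^{\infty} r_m \left(s_0^2\right)^m \qquad (17)$$

$$\phantom{S} = 1 - \left(b s_0^2 + 1 - b\right)^{R} \qquad (18)$$

where $s_0$ is the solution of the equation: 

$$s = G_1(s) \qquad (19)$$

$$\phantom{s} = \mu^{-1} \sum_{m=0}^{\infty} m\, r_m \left(s^2\right)^{m-1} \qquad (20)$$

$$\phantom{s} = \left(b s^2 + 1 - b\right)^{R-1} \qquad (21)$$

As we have assumed $S > 0$, we know that $s_0$ must be 
some value larger than 0 and smaller than 1; $s = 1$ 
is a trivial solution of the equation. 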
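
The self-consistency condition for $s$ can be solved numerically. The sketch below is one possible approach and not necessarily the one used to produce the analytical curves: it applies fixed-point iteration to $s = (b s^2 + 1 - b)^{R-1}$ starting from $s = 0$, which approaches the smallest solution in $[0, 1]$, and then evaluates $S = 1 - (b s_0^2 + 1 - b)^{R}$. The function name, tolerance, and iteration cap are assumptions.

```python
import math


def triangle_giant_component(avg_degree, n, tol=1e-12, max_iter=100_000):
    """Giant component fraction S of the random triangle graph."""
    big_r = (n - 1) * (n - 2) // 2          # R = C(N - 1, 2)
    b = avg_degree / ((n - 1) * (n - 2))    # Equation (5)

    def power(s, exponent):
        # (b s^2 + 1 - b)^exponent, evaluated in log space because
        # b is tiny and the exponent is of order N^2
        return math.exp(exponent * math.log1p(-b * (1.0 - s * s)))

    s = 0.0
    for _ in range(max_iter):
        s_next = power(s, big_r - 1)
        converged = abs(s_next - s) < tol
        s = s_next
        if converged:
            break
    return 1.0 - power(s, big_r)


for k in (0.5, 1.5, 2.0, 3.0):
    print(k, triangle_giant_component(k, 10_000))
```

Below the threshold $\langle k \rangle = 1$ the iteration converges to the trivial solution $s = 1$ and $S$ vanishes; close to the threshold convergence becomes slow, which is why the iteration cap is generous.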

We compare the solution $s_0$ to numerical simulations 
of networks of random triangles. Each network contains 
$N = 10{,}000$ nodes, and we select $M$ random triangles to 
connect from the $N$ nodes. For each value of $M$ we generate 
50 random networks and average the size of the giant 
component. The results, shown in Figure 6, show excellent 
agreement between the analytical prediction and the 
numerical simulation. For comparison, we show both the 
numerical simulation and the analytical result for the size of 
the giant component in an Erdos-Renyi random graph 
with the same number of nodes and edges. The size of 
the giant component in the Erdos-Renyi graph is given by 
the solution $s$ to the equation $s = 1 - \exp(-\langle k \rangle s)$. From 
the figure, we can see that as the average degree grows, the 
phase transitions of the transitive graph and the random 
graph occur at the same average degree, while the size of the giant 
component of the Erdos-Renyi graph grows more quickly as 
we increase the average degree. An intuitive explanation 
is that in an Erdos-Renyi graph one need not expend a 
'closure' edge to close a triad. Rather, that edge can be 
used to connect a disconnected node or small component 
to the giant component. 
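
The simulation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the union-find implementation, and the choice to draw the comparison graph as $3M$ uniformly random node pairs (so that rare duplicate edges are tolerated) are all assumptions.

```python
import random


def giant_fraction(num_nodes, edges):
    """Fraction of nodes in the largest connected component (union-find)."""
    parent = list(range(num_nodes))
    size = [1] * num_nodes

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            if size[ru] < size[rv]:
                ru, rv = rv, ru
            parent[rv] = ru
            size[ru] += size[rv]
    # A merged root keeps a stale size no larger than its new root's size,
    # so the maximum entry is the size of the largest component.
    return max(size) / num_nodes


def triangle_edges(num_nodes, num_triangles, rng):
    """Edges of M triangles, each on three distinct random nodes."""
    edges = []
    for _ in range(num_triangles):
        a, b, c = rng.sample(range(num_nodes), 3)
        edges += [(a, b), (b, c), (a, c)]
    return edges


def random_edges(num_nodes, num_edges, rng):
    """Independent random edges between distinct nodes (duplicates possible)."""
    return [tuple(rng.sample(range(num_nodes), 2)) for _ in range(num_edges)]


def mean_giant_fractions(num_nodes, num_triangles, runs=10, seed=0):
    """Average giant-component fraction for triangle and random graphs."""
    rng = random.Random(seed)
    tri = er = 0.0
    for _ in range(runs):
        tri += giant_fraction(num_nodes, triangle_edges(num_nodes, num_triangles, rng))
        er += giant_fraction(num_nodes, random_edges(num_nodes, 3 * num_triangles, rng))
    return tri / runs, er / runs


if __name__ == "__main__":
    print(mean_giant_fractions(10_000, 5_000, runs=5))
```

Sweeping `num_triangles` and plotting both averages against the average degree $6M/N$ would give curves of the kind compared in Figure 6.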

The fact that the phase transition occurs at the same 
average degree for both the Erdos-Renyi and transitive 
network shows that the requirement of transitivity does 
not result in a need for increased average connectivity in 
order for the giant component to form. Note that the 
phase transition in our model, where all edges are the 
result of the addition of triangles, is quite different from 
what it is in a graph that would result from taking a 
simple Erdos-Renyi graph and removing all edges that 
do not fall within a triangle. In the Erdos-Renyi graph
with non-transitive edge removal the percolation threshold
occurs at a degree that scales as $N^{1/3}$.

This condition for the giant component in an Erdos-Renyi graph
with weak ties removed can be derived as follows. A giant component
of strong ties forms when, after arriving at an arbitrary triangle
$T$, the expected number of other adjacent triangles that one could
"move to" is equal to 1. The expected number of triangles $T'$
adjacent to $T$, other than the triangle from which we reached $T$,
is $2\binom{N-3}{2}p^3$. There are $\binom{N-3}{2}$ choices for the
vertices in $T'$ not shared with $T$, and two choices of the vertex
shared by $T$ and $T'$ (excluding the vertex of $T$ that is shared
with the triangle we arrived from). Here $p = \langle k \rangle / N$
is the probability that any two vertices in an Erdos-Renyi graph
share an edge. Thus when $N$ is large, the average degree at the
phase transition is $\langle k \rangle = N^{1/3}$. In several real
world networks the average degree was found to vary as $N^{\beta}$
where $0 < \beta < 0.3$ [?]. But in a random network, this density
falls short of the $N^{1/3}$ necessary to make the accidental
occurrence of closed triads (and therefore strong ties) high enough
for the network to percolate.
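
The scaling of the threshold follows from setting the expected number of onward triangles to 1. The short derivation below is a sketch that keeps only the leading order in $N$; it uses only the quantities named in the paragraph above.

```latex
\begin{align*}
2\binom{N-3}{2}p^{3} &= (N-3)(N-4)\,p^{3} = 1,
  \qquad p = \frac{\langle k \rangle}{N}, \\
(N-3)(N-4)\,\frac{\langle k \rangle^{3}}{N^{3}}
  &\approx \frac{\langle k \rangle^{3}}{N} = 1 \quad (N \gg 1), \\
\langle k \rangle &\approx N^{1/3}.
\end{align*}
```

For example, a network of $N = 10^6$ nodes would need an average degree of about 100 before its accidental triangles percolate.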

If one further requires that the triangles overlap not just in one
node but in two, as in the percolation of k-cliques [?], the phase
transition occurs at a critical average degree that grows as
$N^{\frac{k-2}{k-1}}$, with $k = 3$. This means that the average
degree has to grow in proportion to $\sqrt{N}$ in order for a giant
component to form. Together, these two results show that the
Erdos-Renyi random graph typically does not contain sufficiently
numerous strong ties to percolate. But as we have shown earlier in
this paper, real world social networks do contain many strong ties
that percolate. This can be intuitively explained by the observation
that new social ties typically form in the context of geographical
and sociocultural settings [19]. In these contexts it is natural
that the ties tend to form closed triads rather than being added
independently, as they are in Erdos-Renyi random graphs.



IV. AVERAGE SHORTEST PATH 

Exact results for the average shortest path are difficult 
to derive even for a random graph. We therefore used 
numerical simulations to measure the average shortest 
path between all reachable nodes as we increase the size 
of the network. We selected a value of the average node
degree where the giant component existed, but did not
take up all of the graph. At our chosen value, $M = 0.5N$,
there are twice as many nodes as triangles. This
constant proportion of triangles to nodes means that $b$,
the probability of any triple of nodes being connected,
falls as $1/N^2$.
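
The $1/N^2$ scaling can be checked with a line of arithmetic. The sketch below assumes that $b$ is well approximated by the number of triangles divided by the number of possible triples, and that overlapping triangles sharing an edge are rare enough to ignore when counting edges.

```latex
\begin{align*}
b &\approx \frac{M}{\binom{N}{3}} = \frac{6M}{N(N-1)(N-2)}
   = \frac{3}{(N-1)(N-2)} \approx \frac{3}{N^{2}} \qquad (M = 0.5N), \\
\langle k \rangle &= \frac{2 \cdot 3M}{N} = \frac{6M}{N} = 3 .
\end{align*}
```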

At $M = 0.5N$, the giant component occupies 76% of the nodes,
while in the equivalent random graph it takes up 94% of the nodes.
This makes it difficult to directly compare the two networks, since
the average shortest path is measured between reachable pairs, and
the Erdos-Renyi graph has more of them. Figure 7 shows that the
average shortest path is actually shorter in the triangle graph.
This may be explained by the fact that there are fewer nodes in the
giant component but a greater density of links. Once we consider
the average shortest path relative to the size of the giant
component, the curves become nearly identical for both networks.
This shows that the requirement of triadic closure does not
negatively impact the average shortest path for reachable pairs,
but those pairs are fewer in number.

FIG. 7: Numerical comparison of the average shortest path in
triangle graphs and Erdos-Renyi graphs with the same number of
nodes and edges. The inset shows the average shortest path as a
function of the size of the giant component rather than the total
number of nodes.
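
The measurement used in this section, the average shortest path over reachable pairs only, can be sketched with a breadth-first search from every node. The helper names below are hypothetical, and this exhaustive version costs one BFS per node; for large networks one might sample source nodes instead.

```python
from collections import deque


def adjacency_from_triangles(triangles):
    """Build an undirected adjacency map from a list of node triples."""
    adj = {}
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (a, c)):
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj


def average_shortest_path(adj):
    """Mean BFS distance over all ordered pairs of distinct, mutually reachable nodes."""
    total = 0
    pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else float("nan")


if __name__ == "__main__":
    # Two triangles sharing node 2: nodes in different triangles are 2 hops apart.
    adj = adjacency_from_triangles([(0, 1, 2), (2, 3, 4)])
    print(average_shortest_path(adj))
```

Unreachable pairs are simply never counted, which is why a graph with a smaller giant component can report a shorter average path.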



V. CONCLUSIONS AND FUTURE WORK 

In this paper we study the connectivity of strong ties 
in networks, where strong ties are defined as belonging 
to closed triads. We find that two real world social net- 
works are robust with respect to removal of weak links, 
in the sense that there remains a giant component that is 
smaller but still occupies a majority of the graph. We also 
find empirically that the removal of weak links lengthens 
the average shortest path modestly. In comparison, the 
removal of weak links in a WS small world network or
an Erdos-Renyi graph would isolate the vast majority of 
nodes. It is the high clustering of social networks that 
allows them to transmit or gather information via strong 
ties. 

We also pose a basic question, which is the cost paid 
for the requirement of transitive ties in terms of the size 
of the giant component and the length of the average 
shortest path. We consider the simplest random graph 
model consisting entirely of closed triads and compare it 
to a network where the links are randomly rewired. We 
find that the giant component emerges at the same point,
when the average node degree equals 1. However, past
the phase transition, the giant component in the graph 
of closed triads grows more slowly than it does in the 
random network. We further examine the dependence of 
the average shortest path on the size of the network and
find it to be almost identical for reachable pairs in both 
the triangle graph and the equivalent random network. 

An unanswered question is whether more sophisticated models of
social structure [?, ?] capture the phenomenon of strong ties that
can be linked together to span an entire network. In particular, in
future work we are interested in examining the strong tie
properties of social networks where the edge probabilities depend
on the hierarchical organization of underlying social dimensions.